The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly promote the sample-efficiency and generalization ability of the RL agent, facilitating solving real-world problems with high-dimensional observations and spare-reward space. We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM can achieve superior performance and outperform the benchmarking schemes with simple architecture and higher efficiency.
translated by 谷歌翻译
The generation of Chinese fonts has a wide range of applications. The currently predominated methods are mainly based on deep generative models, especially the generative adversarial networks (GANs). However, existing GAN-based models usually suffer from the well-known mode collapse problem. When mode collapse happens, the kind of GAN-based models will be failure to yield the correct fonts. To address this issue, we introduce a one-bit stroke encoding and a few-shot semi-supervised scheme (i.e., using a few paired data as semi-supervised information) to explore the local and global structure information of Chinese characters respectively, motivated by the intuition that strokes and characters directly embody certain local and global modes of Chinese characters. Based on these ideas, this paper proposes an effective model called \textit{StrokeGAN+}, which incorporates the stroke encoding and the few-shot semi-supervised scheme into the CycleGAN model. The effectiveness of the proposed model is demonstrated by amounts of experiments. Experimental results show that the mode collapse issue can be effectively alleviated by the introduced one-bit stroke encoding and few-shot semi-supervised training scheme, and that the proposed model outperforms the state-of-the-art models in fourteen font generation tasks in terms of four important evaluation metrics and the quality of generated characters. Besides CycleGAN, we also show that the proposed idea can be adapted to other existing models to improve their performance. The effectiveness of the proposed model for the zero-shot traditional Chinese font generation is also evaluated in this paper.
translated by 谷歌翻译
探索对于具有高维观察和稀疏奖励的复杂环境中的深度强化学习至关重要。为了解决这个问题,最新的方法旨在利用内在的奖励来改善勘探,例如基于新颖的探索和基于预测的探索。但是,许多固有的奖励模块需要复杂的结构和表示学习,从而导致了过度的计算复杂性和不稳定的性能。在本文中,我们提出了一种有益的情节访问差异(REVD),这是一种计算有效且量化的探索方法。更具体地说,REVD通过评估情节之间的基于R \'Enyi Divergence的访问差异来提供内在的奖励。为了进行有效的差异估计,使用随机定义状态编码器使用K-Nearest邻居估计器。最后,在Pybullet机器人环境和Atari游戏上测试了REVD。广泛的实验表明,REVD可以显着提高强化学习算法的样本效率,并优于基准测定方法。
translated by 谷歌翻译
在过去的十年中,AI AID毒品发现(AIDD)的计算方法和数据集策划的繁荣发展。但是,现实世界中的药物数据集经常表现出高度不平衡的分布,这在很大程度上被当前的文献忽略了,但可能会严重损害机器学习应用程序的公平性和概括。在这一观察结果的激励下,我们介绍了Imdrug,这是一个全面的基准标准,其开源python库由4个不平衡设置,11个AI-Ready数据集,54个学习任务和16种为不平衡学习量身定制的基线算法。它为涵盖广泛的药物发现管道(例如分子建模,药物靶标相互作用和逆合合成)的问题和解决方案提供了可访问且可定制的测试床。我们通过新的评估指标进行广泛的实证研究,以证明现有算法在数据不平衡情况下无法解决药物和药物挑战。我们认为,Imdrug为未来的研究和发展开辟了途径,在AIDD和深度不平衡学习的交集中对现实世界中的挑战开辟了道路。
translated by 谷歌翻译
对卷积神经网络(CNN)的知识蒸馏(KD)进行了广泛的研究,以提高小型模型的性能。最近,Vision Transformer(VIT)在许多计算机视觉任务上取得了巨大的成功,而VIT的KD也需要实现。但是,除了基于输出logit的KD之外,由于巨大的结构间隙,其他基于特征的CNN基于特征的KD方法不能直接应用于VIT。在本文中,我们探讨了对VIT的基于特征的蒸馏方式。根据VIT中特征地图的性质,我们设计了一系列受控的实验,并为VIT特征蒸馏提供了三个实用指南。我们的一些发现甚至与CNN时代的实践相反。根据三个准则,我们提出了基于功能的方法Vitkd,从而为学生带来一致且相当大的改进。在ImagEnet-1K上,我们将DEIT微型从74.42%提高到76.06%,Deit-Small从80.55%提高到81.95%,而Deit-Base则从81.76%升至83.46%。此外,Vitkd和基于Logit的KD方法是互补的,可以直接使用。这种组合可以进一步提高学生的表现。具体而言,学生DEIT微小,小和基础分别达到77.78%,83.59%和85.41%。该代码可在https://github.com/yzd-v/cls_kd上找到。
translated by 谷歌翻译
本文提出了一个简单而有效的框架蒙版,该框架将新提出的掩盖自distillation纳入对比的语言图像预处理中。掩盖自distillation的核心思想是将表示从完整的图像提取到蒙版图像预测的表示形式。这种合并享有两个重要的好处。首先,掩盖的自我验证目标是本地贴片表示学习,这与视觉对比度的互补,专注于与文本相关的表示。二,掩盖的自我验证也与视觉语言对比符合训练目标的视野对比是一致的。视觉编码器用于功能对齐,因此能够学习本地语义从该语言中获得间接监督。我们提供了专门设计的实验,并进行了全面的分析,以验证这两个好处。从经验上讲,我们表明,当MaskClip应用于各种具有挑战性的下游任务时,可以在线性探测,填充和零拍摄中取得卓越的结果,并在语言编码器的指导下取得了卓越的结果。
translated by 谷歌翻译
随着深度卷积神经网络的兴起,对象检测在过去几年中取得了突出的进步。但是,这种繁荣无法掩盖小物体检测(SOD)的不令人满意的情况,这是计算机视觉中臭名昭著的挑战性任务之一,这是由于视觉外观不佳和由小目标的内在结构引起的嘈杂表示。此外,用于基准小对象检测方法基准测试的大规模数据集仍然是瓶颈。在本文中,我们首先对小物体检测进行了详尽的审查。然后,为了催化SOD的发展,我们分别构建了两个大规模的小物体检测数据集(SODA),SODA-D和SODA-A,分别集中在驾驶和空中场景上。 SODA-D包括24704个高质量的交通图像和277596个9个类别的实例。对于苏打水,我们收集2510个高分辨率航空图像,并在9个类别上注释800203实例。众所周知,拟议的数据集是有史以来首次尝试使用针对多类SOD量身定制的大量注释实例进行大规模基准测试。最后,我们评估主流方法在苏打水上的性能。我们预计发布的基准可以促进SOD的发展,并产生该领域的更多突破。数据集和代码将很快在:\ url {https://shaunyuan22.github.io/soda}上。
translated by 谷歌翻译
人们说:“一张照片值一千字”。那么,我们如何从图像中获取丰富的信息?我们认为,通过使用视觉线索来桥接大型的识别视觉基础模型和语言模型,我们可以无需任何额外的跨模式训练。得益于基础模型的强大零拍功能,我们首先构建图像的丰富语义表示(例如,图像标签,对象属性 /位置,字幕)作为结构化的文本提示,称为视觉线索,使用视觉基础模型。基于视觉线索,我们使用大型语言模型为视觉内容生成一系列综合描述,然后再次通过视觉模型验证,以选择与图像最合适的候选人。我们通过定量和定性测量评估生成的描述的质量。结果证明了这种结构化语义表示的有效性。
translated by 谷歌翻译
动态作业车间调度问题(DJSP)是一类是专门考虑固有的不确定性,如切换顺序要求和现实的智能制造的设置可能机器故障调度任务。因为传统方法不能动态生成环境的扰动面有效调度策略,我们制定DJSP马尔可夫决策过程(MDP)通过强化学习(RL)加以解决。为此,我们提出了一个灵活的混合架构,采用析取图的状态和一组通用的调度规则与之前最小的领域知识的行动空间。注意机制被用作状态的特征提取的图形表示学习(GRL)模块,并且采用双决斗深Q-网络与优先重放和嘈杂的网络(D3QPN)到每个状态映射到最适当的调度规则。此外,我们提出Gymjsp,基于众所周知的或图书馆公共标杆,提供了RL和DJSP研究社区标准化现成的现成工具。各种DJSP实例综合实验证实,我们提出的框架是优于基准算法可在所有情况下,较小的完工时间,并提供了在混合架构的各个组成部分的有效性实证理由。
translated by 谷歌翻译